A Statistical Social Network Model for Consumption Data in 

Food Webs 



Grace S. ChitQand Anton H. Westvelcfl 



Abstract 

We adapt existing statistical modelling techniques for social networks to study consumption data observed 
in food webs. These data describe the feeding among organisms grouped into nodes that form the food web. 
Model complexity arises due to the extensive amount of zeros in the data, as each node in the web is predator 
/ prey to only a small number of other nodes. Many of the zeros are regarded as structural (non-random) 
in the context of feeding behaviour. The presence of "bottom prey" and "top predator" species (those who 
never consume and those who are never consumed, with probability 1) creates additional complexity to the 
modelling framework. We develop a special statistical social network model to account for such network 
features. The model is applied to a well-studied food web, for which the population size of seals is of 
concern to various commercial fisheries. 
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1 Why social networks? 



A food web is a network of organisms. When the relationship among them is of interest, organisms 
are typically aggregated at various resolutions to form nodes. For example, one node may consist of 
various squid species, while another may consist of the single species Argyrosomus hololepidotus 
(kob). In the trophic context, we are interested in the feeding relations among the nodes in the food 
web. Consider the pair of nodes (i, j). The within-pair predatory relation can be depicted as one 
of the following: 

i i i-)- j j j (1) 

where, conventionally, any link / arrow points from prey to predator. From left to right, the depic- 
tions in ([T]) respectively represent no predation between i and j, predation of i by j but not vice 
versa, predation of j by i but not vice versa, and mutual predation between i and j. To represent 
([T]) in a quantitative framework, the absence of a link is referred to as a zero link, so that "i j" 
consists of two zero links, each of "z — )■ j" and "i ^ j" consists of a zero and a non-zero link, and 
"z o j" consists of two non-zero links. Since these links are directed, each pair yields two 
directed links: from ito j, and from j to i. Extending this to all n nodes in the food web, we have 
a network that consists of 2x(r?,-choose-2) = n(n — 1) pairwise or dyadic directed links. 

Research on relational patterns in a network of nodes arises in many practical settings, most 
commonly in the social sciences, e.g. friendship, international trade. In such contexts, networks are 
referred to as social networks. We apply the same nomenclature to studying relational patterns in a 
food web with respect to predation. Various quantitative social network analysis (SNA) techniques 
have been developed to understand network relational patterns (e.g. Mucha etal, 2010; Wasserman 
& Faust, 1994) and adopted in food web research (e.g. Dambacher et ah, in press; Krause et ah, 
2003); these SNA methods are based largely on the mathematical notion of equivalence class for 
defining congruence among individual elements in a given set according to certain criteria (see, 
e.g. Diintsch and Gediga, 2000). The objective of this type of SNA is to seek optimal partitions of 
the network into compartments of nodes subject to the given criteria. For example, compartments 
identified in a food web may correspond to trophic levels. 

Recently, statistical regression methodologies have been developed to express network links 
as the random response of within-node and intemode characteristics (Westveld & Hoff, in press; 
Hoff, 2005; Gill & Swartz, 2001). The resulting conclusions about network features are purely 
empirical, based entirely on observed network attributes without the use of network dynamics the- 
ory or subject-matter theory. Chiu & Westveld (2010) demonstrate that, in the context of food 
webs, statistical SNA techniques can provide an alternative perspective of trophic relational pat- 
terns according to feeding activity and preference. These authors apply the statistical SNA mod- 
elling framework by Ward et al. (2007) to regress the presence-absence of pairwise predation, in 
a mixed-effects logistic model, on the dyadic node characteristic of phylogenetic similarity; eight 
food webs are analyzed this way. 

The basis of a statistical model for SNA is a two-way analysis-of-(co)variance (ANO(CO)VA). 
To see this, consider the mixed model in Chiu & Westveld (2010) with one optional covariate: 

log = /So + l^lXij + Si + Tj + u[Vj + Eij , j (2) 

i Pij 
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where Pij is the probability that j predates the phylogenetic similarity between 

i and j, Si is the iih. random sender (prey) effect, is the jth random receiver (predator) effect, 
u[vj is the random interaction between i and j expressed as an inner product of A;-dimensional 
vectors Ui and Vj, and is the remaining random component unattributable to Xij, Si, Vj, Ui, or 
Vj. All of Si, Tj, Ui, Vj, and £ij are mean-zero Gaussian random errors. (Note that expressing the 
interaction term as the inner product of latent vectors Ui and vj is due to Hoff et al, 2002. The 
latent u- and t>-spaces are abstract entities; their dimension k can be regarded as a model parameter 
to be estimated from the data. See Section [TTT] for an interpretation of the latent spaces.) Complex 
network dependence not addressed by the ANO(CO)VA equation is modelled through 
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for alH = 1, . . . , n , (3) 
for all j . (4) 
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Equation ([3]) stipulates that sender and receiver effects due to the same node are possibly corre- 
lated ipsr), with potentially distinct amounts of uncertainty {as and ar)', this within-node structure 
is constant across nodes. Equation (|4]) allows for potential reciprocity between i and j through 
p (constant across all (z, j)-pairs), with the typical assumption of homogeneous random errors 
through a. 

There are various advantages to the use of a statistical SNA framework when studying food 
webs: 

1. Randomness or uncertainty is naturally inherent in the network links. For example, having 
observed "i j" (no predation) does not necessarily preclude the future occurrence of "i — )■ 
j." This type of randomness is readily acknowledged and modelled through the statistical 
framework. 

2. Factors that may influence or be associated with the network links can be explicitly expressed 
as covariates in the statistical model. This facilitates the immediate understanding of what 
makes a given node the predator or the prey, and avoids post-SNA "detective work" that may 
be necessary when using non-statistical SNA methods, which provide information about who 
tends to eat whom but not why. Depending on the context, covariates in a statistical model 
often provide insight into why. 

3. Statistical inference for the random effects Si,rj,Ui, and vj provides insight into trophic 
patterns (e.g. clustering of nodes) from three different perspectives simultaneously: (i) ac- 
tivity level as prey and as predator, (ii) preference of being consumed, and (iii) preference 
of consuming; see Section [TTT] for the rationale. A single SNA that considers all three per- 
spectives provides a more comprehensive understanding of food web structure. Assessing 
uncertainty in the trophic patterns is part of the one-step inference. With equivalence class 
methods, typically one analysis is produced for each pre-defined set of criteria correspond- 
ing to a single perspective of congruence, and uncertainty assessment is applied separately 
by randomly permuting network links then re-analyzing each permutation under the given 
criteria (Krause et al, 2003). 
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4. As part of the one-step inference through a model fit, statistical inference for network de- 
pendency parameters psr and p provides further insight into trophic features that underlie the 
link patterns (see Section [TT I. 

5. The regression framework of a statistical SNA readily allows (i) prediction of network links 
under different scenarios, and (ii) assessment of uncertainty in the predictions. 

6. Since a statistical SNA is based solely on empirical information, results in certain contexts 
may be used to validate projections made based on deterministic mathematical models for 
trophic relations. 

1.1 Interpreting the statistical social network model for food webs 

Although observations on feeding behaviour may be observed at the organism or species level, 
trophic food web data are typically recorded for nodes and presented in a diet matrix, as follows: 
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For presence- absence data, yij = 1 if j is observed to predate on i, and yij = if this predation 
is not observed. For weighted data, Uij is the weight, or magnitude, of the consumption of i by 
j. For example, yij can be the total biomass of i consumed by j. Whether presence-absence 
or magnitude is considered, ya corresponds to cannibalism of i, and is not modelled by existing 
statistical SNA methods. This is because "z — )■ i" is often undefined in a typical social network 
context such as friendship or trade. Although cannibalism is not uncommon in nature, we regard it 
as a secondary or nuisance feature of the food web when intemodal trophic relations are considered 
empirically. The reason is as follows. Based purely on which node is observed to predate on which 
other node, we wish to understand primarily the relative influence of nodes on each other, instead 
of a given node's self influence. Furthermore, a model such as ([2]) depicts a temporal snapshot 
of the food web, and does not address food web dynamics in which cannibalistic behaviour may 
directly influence other nodes due to the dynamical nature of the system. An empirical analog of 
a dynamic network model is the longitudinal social network (LSN) model of Westveld & Hoff (in 
press). While potentially valuable in principle for food web research, longitudinal statistical SNA 
often may be infeasible in this context due to the lack of repeated field observations of the same 
food web over time. Consequently, in this report we focus on the non-temporal empirical aspect 
of food webs while ignoring cannibalism. 

Despite the limitations of the statistical SNA framework, it can provide valuable insight into 
the food web structure through its greatly interpretable model parameters. To see this, consider 
(|2])-(|4]). Given Node i, the bivariate random effect [sj, rj]' describes the level of feeding activity 
of the node, after adjusting for covariate effects. Feeding activity for i is related to its in-degree 
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Q2j Vji — total activity as predator) and out-degree Q2- Uij — total activity as prey). In- and out- 
degree unexplained by covariate(s) are captured respectively by Sj and rj. A graph of the estimated 
[sj,rj]' vectors (Graph SR in Chiu & Westveld, 2010) displays the positions of nodes according 
to their feeding activity; any identifiable cluster may be considered a "trophic level" from the 
perspective of feeding activity as prey and as predator. 

Estimates of the A;-dimensional vectors Ui and Vj may also be graphed (Graphs U and V in 
Chiu & Westveld, 2010). We take A; = 2 to reduce model complexity and allow easy visualization 
of the vectors. Alternatively, k can be determined via optimization criteria, then projected onto 

for graphical display (Ward et ai, 2007; Hoff, 2005). In either case, the latent /c-dimensional 
w-space corresponds to preference of being consumed. As Chiu & Westveld (2010) explain, if Uj 
and Uj are neighbours in the ti-space, then the sending behaviour of i to £ — after accounting for 
sending activity — is similar to that of j to i, for all nodes £. In a food web, this phenomenon 
translates to i and j being similarly preferred as prey. The same interpretation applies to Vi and 
Vj being neighbours in the f -space, except for their similarity in receiving, or preference for prey. 
Thus, clustering in the i^-space suggests trophic levels with respect to preference of being prey, 
and clustering in the t;-space suggests trophic levels with respect to preference of being predator. 
Chiu & Westveld (2010) demonstrate that trophic clusters identified in the three graphs can differ 
substantially depending on the perspective from which trophic relations are viewed. For example, 
nodes which show similarity in feeding activity ([sj, ri]' vectors being closely clustered) may differ 
drastically in their feeding preference (Vi vectors belonging to separate clusters in the v-space); 
this was seen in various food webs analyzed by Chiu & Westveld (2010). 

Finally, statistical SNA not only provides information on trophic clustering from various per- 
spectives, but also on the dependency among nodes beyond the patterns of intemodal links. This 
extra insight is achieved through the statistical inference for psr and p. Consider the sign of psr, 
which is reflected by the trend of the graphical display of the estimated [sj, rj]' vectors. A positive 
trend suggests that active predators tend to be active as prey, and a negative trend suggests that ac- 
tive predators tend to be inactive as prey. Insight into the reciprocity of predation, or the tendency 
of predator-prey role reversal, is available through the inference for p. A positive p indicates that 
the predation of i by j is positively associated with the predation of jhy i, and hence, the tendency 
for predator-prey role reversal between i and j. Conversely, a negative p indicates that this role 
reversal is unlikely. 

Note that statistical social network models typically involve Bayesian inference, as mixed- 
effects models with complex dependence structure can be naturally constructed in a Bayesian hi- 
erarchical framework. Posterior inference can be made via Markov chain Monte Carlo (MCMC). 
However, posterior inference for Ui and Vj via MCMC requires careful handling of the MCMC 
samples; see Chiu & Westveld (2010). 

2 Modelling consumption data 

The discussion on the method of statistical SNA and its merits apply generally to the cases of yij 
being presence-absence or weighted data. However, weighted food web data may pose a challenge, 
due to the high incidence of yij = 0. For example, the eight food webs analyzed by Chiu & 
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Westveld (2010) each consists of between 76% and 98% zeros. Direct application of existing 
statistical SNA techniques to such weighted data would require a special distributional assumption 
for yij to account for its extreme point mass at and its continuous distribution away from (Figure 
[T|). For this, a mixture distribution may be appropriate, at the expense of model complexity and 
computational burden. In this report, we propose an alternative approach that does not require the 
same level of model complexity, and is reasonably straightforward to implement. 

Figure 1 : Distribution of consumption data for the Benguela food web. 
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Consider the nature of zeros in food web data. It is unlike the social context of, say, friend- 
ship in which typically (i) non-zero links between two nodes are common, and (ii) the randomness 
inherent in the links is substantial enough that under different scenarios, zero links can plausibly 
become non-zero and vice versa. In contrast, zero links often dominate a food web; given such 
a link, biological theory can easily identify the nature of this 0, such as the case of herbivorous 
grazers almost surely not consuming other animal species under any realistic scenario. Among the 
vast number of zeros, those regarded as truly random are typically no more than a handful, if any. 
For this reason, here we regard all observed zeros as structural zeros, and remove them from con- 
sideration when constructing the statistical social network model. Note that for presence- absence 
food web data, including the zeros in the logistic model (as is done by Chiu & Westveld, 2010) 
implies a different interpretation than what we propose in this report. Specifically, ([2]) regards the 
entire food web (i.e. the set of all n(n — 1) directed links) as one random entity, whereas in our 
current context, the randomness in each directed link is being modelled. As discussed above, it 
is not straightforward to consider the randomness of the entire weighted food web, for which the 
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distribution of i/ij is highly non-standard. 

We refer to the weighted food web excluding zero links as the "reduced weighted food web," 
or "reduced web" for short. To model this reduced web, some notation is necessary. Let 



^0 = {(?, j) e S* : Vij^Vji = 0} , 



S = S*\So, 



Si = e S : yij > 0, yj^ > 0} , 

'^2 = j) G S : yij > 0, yji = 0} , 
Si = e S : yij = 0, yj, > 0} . 



Note that S* is the set of all ?2-choose-2 nodes in the food web, and Sq, iSi, 1S2, 1S3 are disjoint. We 
need not consider the set Sq of unlinked pairs, since by definition of the reduced web, such pairs 
are not considered. Thus, Ul^^Sk = S. Also note that Si consists of all pairs who show mutual 
predation, ^2 consists of send-only pairs (i.e. j predates on i but not vice versa), and S3 consists of 
receive-only pairs (i.e. i predates on j but not vice versa). We additionally let 



The set X consists of all n nodes of the food web, which is broken down into three disjoint sets 
Xi, X2, and X3, where Ul^ilk = X. The set Xi consists of all basal nodes, or bottom prey who never 
consume but are predated upon by at least one other node in the web; in contrast, X3 consists of top 
predators who only consume but are never predated upon; X2 consists of the remaining "middle" 
nodes. 

For a given node, the standard statistical social network model simultaneously considers links 
into and out of the node through ([3]) and (|4]). For the reduced web, however, (|3]) only applies to 
nodes in X2, each of which plays the role of prey as well as that of predator. In contrast, nodes in 
Xi and X3 are linked to other nodes in one direction only, so that (|4]) is degenerate and reduces to 
Si being independent and identically distributed (iid) as N(0, crf ) for i E Zi and rj being iid N(0, 
cr^) for i G X3. Similarly, Q only applies to mutual predators (z, j) G Si. For ^2 or ^3, one of the 
two links is missing, so that Q is degenerate and reduces to Sij being iid N(0, a^). 

Under this reduced framework, technically Si is undefined for i E and Vj is undefined for 
i Ell. However, to facilitate the visualization of feeding activity, we arbitrarily define 



X={1,2,..., 



n} , 




Si = —Aal for all i G X; 



•3 , 



Ti = —Aal for all i eXi 



(5) 
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so that given the variability of overall feeding activity, the "random effects" in ([5]) are in fact 
constant, and are appropriately located in the far left tail of the distribution of Si and rj. Then, any 
[si, Ti]' for z G X may be displayed on the sr-plane. 

On the other hand, leaving Ui,Vi for i ^ Z2 undefined would not hinder the visualization 
of feeding preference, since we consider the u- and i;-spaces separately (as opposed to the cross 
product of the s- and r-spaces). Thus, we can consider the distribution of Ui for all i G Xi U X2 in 
the M-space, and that of Vi for all i G X2 U X3 in the w-space. As do Chiu & Westveld (2010), we 
also take /c = 2 to minimize model complexity, and take Var(Mjg) = a^^, and Var(t'jg) = cr^g for all 
i and q = 1,2, where Ui = [ua, Ui2\ and Vi = [va, Vi2\'. 

3 Model implementation for the Benguela food web 

A well-studied food web is that of the Benguela ecosystem, originally discussed by Yodzis (1998) 
but further studied by, e.g. Chiu & Westveld (2010) and Dunne et al. (2004). 

3.1 Data 

The diet matrix in Yodzis (1998) consists of diet percentages for 29 nodes (Table[T]), and is accom- 
panied by several relevant variables: adult individual body mass, annual harvest, carrying capacity, 
ingestion factor, and population biomass. The (z, j)th cell of the diet matrix represents the percent- 
age of j's diet through consuming i. Thus, each column of the diet matrix necessarily sums to 100 
if all organisms in the actual food web are represented in the diet matrix. However, this appears 
not to be the case, as column sums range from 90 to 131.5 for non-basal nodes. 

To derive weighted data that corresponds to consumption volume, first we scale the diet per- 
centages according to the column sums so that each scaled column for a non-basal node sums to 
100. The columns for the two basal nodes necessarily consist of all Os and remain unaltered. Next, 
we use the scaled diet proportions to define two different measures of consumption volume: 

(i, j)th population consumption = (j's population biomass) x 

(j's ingestion factor) x 

(j's diet proportion due to i) 
{i, j)th per-adult consumption = (j's adult individual biomass) x 

(j's ingestion factor) x 

(j's diet proportion due to i) 

where j's ingestion factor is the "fraction of physiologically maximal ingestion" (Yodzis, 1998) 
for j. Both definitions refer to the biomass of i consumed by j. We take 

Vij = ((«,j)th population consumption) 20 (6) 
or ?/y = ((i,j)th per-adult consumption) 10 . (7) 
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As can be seen in Figure [T| the 20th-root transformation in (|6]) results in a reasonably Gaussian 
distribution after the removal of links. The same is true for the lOth-root transformation in Q 
(not shown). The objective is to express either definition of i/ij as the response of covariates in the 
reduced statistical SNA framework. Although these transformations appear to be drastic and not 
easily interpretable, having approximate normality eliminates extra model complexity that would 
have been needed to address non-normality, given an already complex framework for modelling 
network dependence. 



3.2 Bayesian hierarchical model 

For these data, only 196 out of the 29x28 = 812 pairwise links are non-zero. The sets S3 = I3 = ^ 
as there are no receive-only nodes. The cardinality of Si,S2,Ii, andX2 are, respectively, 5 (see the 
illustration on the cover of this report), 186, 2, and 27. Thus, the statistical social network model 
comprises the following set of distributional statements: 



Vij 



i^ij) Si J Sjj Tij Tjj Uij Ujj Vij Vjj f2 

BVN 



P-ij 
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Si 
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"^3 


+ 








Ti 











(8) 



n) forall(z,j) 

for all e S2 



Uij Si, rj, Ui, Vj,a N{jjij + Si + rj + u/vj, a"^ 

Hij = x'ijf3 forall (z,j) G 5, 
Si\as ~ N(0, cr^) , ri\ar = —Aar for all i E Ii , 

Si 

ri 

2 



(9) 



S ~ BVN(0, S) for all i e I2 , 
N(0, alg) for alH G X and g = 1, 2 , 
for all i EX2 and g = 1, 2 , 
where Xij is the vector of covariate values for pair and /3 is the corresponding regression 



Uig\<^uq ~ 



Viq\0-vq ~ N(0,a^ 



coefficient. The choice of x,i is discussed in Section 3.3 



For Bayesian inference of complex models, the mixing of MCMC draws is often a practical 
concern. To mitigate mixing difficulties, first note that ([9]) implies 



N(0,a 



sJ 5 



ri\si ~ N(Asi 



Ps 



A- 



(10) 



To see this, rewrite the last expression of ( 10) as A = psrCrr/cTs- Hence, the 2nd expression of ( 10) 
implies 0^ = (1 — pl^)al. These expressions for A and 0^ imply 



2 

Psr 



A' 



which in turn implies the third expression in (10). The use of (10) avoids generating MCMC 
samples of S from a matrix distribution such as Wishart, which caused major mixing difficulties 
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in our case. Mixing issues associated with Wishart priors are also discussed in The BUGS Project 
frequently asked questions. 

We consider reasonably diffuse proper priors in the Bayesian hierarchy. The diffuseness re- 
flects our lack of prior knowledge of parameter values. Let T(a, b) denote the Gamma distribution 
parametrized in such a way that small values of a and b lead to diffuseness. We take 

/^=^^' ^~N(0,0.822), (11) 
A,/3^ ~N(0,a-^) foralH = 0,1,...,L, (12) 

1-2 -2 -2 



^ (^s ^ ^ ^^J, ~ r(a, a) for all g = 1, 2 



where L is the number of covariates, and a < 0.01 (small a leads to diffuseness). The actual choice 
of a varied in our analyses and was found to have virtually no influence on the results. Expression 



(11 1 employs the Fisher transformation to avoid taking p ~ U[— 1, 1] so as to improve MCMC 



mixing; the choice of distribution for z corresponds very closely to p ~ U[— 1, 1]. 



3.3 Covariates 

From the variables in Yodzis (1998) that accompany the matrix of diet percentages, we consider 

= sender adult individual biomass, 
t^2 = annual harvest of sender, 

= receiver adult individual biomass, 
fj2 = annual harvest of receiver. 

Note that t^s are sender-specific covariates, and f^s are receiver- specific. In addition, we create a 
pair- specific covariate 

if, = taxonomic distance 

by comparing the taxonomic classification of i and j according to their domain, kingdom, phy- 
lum, class, order, family, genus, and species. This is the distance counterpart of the conservative 
phylogenetic similarity measure Zij of Chiu & Westveld (2010). We do not consider carrying 
capacity as a covariate, as it is highly related to the notion of ingestion factor, which is used to 
define the response variable. Altogether, we have five covariates, all of which are non-negative. 
A log-transformation is applied to all five, which substantially reduces skewness of the distribu- 
tions. Covariates that have values are shifted up by the value 1 before taking log, and thus 
non-negativity is preserved. 

Next, each log(t) is centered to reduce dependence among the corresponding regression coef- 
ficients in the Bayesian inference. We assume this dependence to be negligible, as is reflected by 



10 



( 12 1. Finally, the covariate vector Xj, = [1, . . . , Xij^]' for L = 5 is 



ij — [J-, 


1/ J? 

Xijli • ■ ■ 1 Xijlj\ iOI 




centered log(t^^) , 


Xij2 


centered log(t|]^) , 




centered log(t|2) , 




centered log(t^]^) , 




centered log(t^2) ■ 



(13) 



3.4 MCMC implementation 



The model in Section 3.2 for the Benguela food web, with the covariates from Section 3.3 
implemented in OpenBUGS 3.0.3 (Thomas et al, 2006). 
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4 Results 

4.1 Models for population consumption 

We first fit the reduced model as described in Section |3j with the population consumption defined 
in ([6]) as the response on all five covariates. We monitor the convergence of the MCMC algorithm 
for all unknown parameters in the model, including fixed and random effects. Marginally, MCMC 
samples for each parameter converged properly except for Ui even after more than four million 
iterations. Non-convergence can be due to unidentifiability and/or insufficient MCMC iterations. 
We observe that Node 1 being a basal node can be a potential source of unidentifiability, although 
convergence of marginal MCMC samples for U2 is achieved for Node 2, which is also basal. We 
continue to investigate the source of this problem. 

Aside from non-convergence associated with Ui, we also observe that the variability of the 
MCMC samples of ui and Vj (after Procrustes transformation; see Chiu & Westveld, 2010) is very 
substantial for alH 7^ 1 and j 7^ 1, 2. This is illustrated in the bottom panels of Figure[2} where two 
posterior distributions overlap greatly despite their means being farthest apart from each other in 
the u- or i;-spaces. The extent of variability we observe here is very unlike that observed by Chiu 
& Westveld (2010) who fit the logistic social network model to presence-absence data including 
all Os. Here, we contend that inference is inadequate for such a large parameter space when only 
24% of all n{n — 1) links are modeled. Despite this, some insight may still be gained from the 
distribution of nodes in the u- and f -spaces according to the posterior means. For example. Nodes 
15 and 16 are deemed to be most differently preferred as prey with respect to how much they are 
consumed; Nodes 1 1 and 13 are deemed to have the most different preference for prey with respect 
to how much they consume. Though, these differences have substantial uncertainty. 

Due to non-convergence of the fit for the model with u'-Vj, we remove this interaction term 
from ([8]) and refit the model. (All subsequent models discussed in this report involves no interaction 
term.) MCMC samples for the entire joint posterior distribution converge. Results are shown in the 
top panels of Figure [3j The top left panel shows the distribution of nodes according to the posterior 
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Figure 2: Results of fitting the reduced model in Section [3| taking i/ij as population consumption 
defined in ([6]). Top two panels: plots of the Benguela food web (arrows point from prey to predator) 
with node label i positioned at the posterior mean of Ui for i ^ I (left) and of Vi for i ^ I, 2 
(right). Bottom left: MCMC samples from the posterior distribution of Ui for ? = 15 (black) and 
16 (gray). Bottom right: MCMC samples from the posterior distribution of for i = 11 (black) 
and 13 (gray). 
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mean for feeding activity, after adjusting for covariates. The two basal nodes (1 and 2) are clearly 
distinct from the rest of the food web. The top right panel shows the MCMC samples for [sj, r^]' for 
i = 1 in black, i = 2m red, i = 9 in blue, and z = 23 in purple. The graph indicates little overlap 
among the posterior distributions of these four nodes. This suggests that they all show statistically 
distinguishable feeding activity with respect to consumption volume that is non-zero. It can be 
deduced that the posterior distributions of the remaining nodes fall between those for Nodes 9 and 
23, and may not be highly statistically distinguishable from each other. Distinguishability of the 
posterior distributions for feeding activity may be used to identify trophic clusters or trophic levels, 
which can facilitate subsequent steps in a typical food web analysis, such as the identification of 
keystone species. We intend to investigate the use of formal distance measures for multivariate 
distributions for this purpose. 

The reader should recall that due to the removal of all yij = 0, the model fit should be inter- 
preted differently than the typical statistical SNA. Thus, the comparison of feeding activity among 
nodes is made by considering true activity only, so that non-activity is not part of the comparison. 
Indeed, the sr-graph for this reduced model shows that biomass sent and biomass received are 
positively correlated (disregarding Nodes 1 and 2), which suggests that those nodes that are more 
actively consumed (by volume) are active consumers (by volume) themselves. This is also seen 
in the credible interval for given in Table [2} although the evidence is not strong (the credible 
level is only 65% for an interval that is to the right of 0). Table [2] also indicates that the ten- 
dency of predator-prey role reversal is high (p > 0) with strong evidence (a 90% credible interval 
for p is to the right of 0). In the conventional food web context, these positive correlations may 
be counterintuitive, since, as suggested by the analyses of Chiu & Westveld (2010), more active 
predators (e.g. sharks) are typically less likely to be consumed by other organisms, and the prey 
of a given predator (e.g. bird) is unlikely to consume the predator in return. However, the conven- 
tional context regards non-activity as a description of activity in general, while this is not so under 
the reduced statistical SNA framework. Specifically, when non-activity is ignored in modelling 
consumption volume as the response, "predator-prey role reversal" refers to the non-zero volume 
of predation activity being reciprocated. 

We can also use Table |2] to assess the statistical relevance of the five covariates in the social 
network model. We see that (3^ is the only regression slope parameter with more than a 0.5 posterior 
probability (credibility) for it to be in some interval that excludes 0, yet even this credibility is low 
(0.55). This suggests that predator biomass is very marginally relevant to explaining non-zero 
population consumption, and the other four covariates are essentially irrelevant. Again, the overall 
weak relevance of covariates should not be regarded as contradicting the conventional context of 
feeding that includes non-consumption (e.g. Chiu & Westveld, 2010, show that taxonomic distance 
is relevant to the presence / absence of a feeding hnk). 

4.2 An attempt to address the question of seal cull 

The weak credibility of non-zero slope parameters might also be a result of fitting a complex model 
to insufficient data. To investigate this, we refit the model with various subsets of the covariates, 
e.g. that with a single covariate, predator harvest. This choice of covariate is due to the scientific 
interest of the potential benefits of a seal cull to future commercial harvest of fish species upon 
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Figure 3: Results of fitting the reduced model in Section[3]but removing u\vj from ([S]), and taking 
Uij as population consumption defined in (|6]). Top left: plot of the Benguela food web (arrows point 
from prey to predator) under the five-covariate model, with node label i positioned at the posterior 
mean of [sj, r^]'. Bottom left: same as top left but with predator harvest as the sole covariate. Top 
right: MCMC samples from the posterior distribution of [sj, rj]' for i = 1 (black), 2 (red), 9 (blue), 
and 23 (purple), under the five-covariate model. 
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Figure 4: Scatter plot of non-zero population consumption ([6]) versus transformed predator harvest 
( 13 1. Points in red correspond to seals as predators. 




which African fur seals predate (Yodzis, 1998). In this case, we take Xij^ to be the only covariate 
in the reduced statistical SNA framework, with again from ([6]). Indeed, the model with predator 
harvest as the single covariate has the smallest deviance information criterion among the models 
we have investigated, suggesting a reasonable balance between fit and model parsimony. 

The bottom panel of Figure |3] shows the fitted feeding activity under this one-covariate model. 
By and large, the results are highly comparable to those of the five-covariate model, with a minor 
shift in relative position of the posterior mean of [sj,rj]' for certain nodes. For example, under 
this submodel. Node 27 appears closer to Nodes 5, 19, and 21 in feeding activity. The marginal 
posterior distribution of [sj,rj]' for i = I, 2, 9, 23 for the one-covariate model (not shown) also 
appears very similar to that for the larger model (Figure |3| top right). 

The credible intervals for the three parameters of interest from the one-covariate model are 
given in Table |3j Their reasonably high credibility here suggests that the weak credibility seen for 
the larger model in Section [4~T] may well be due to insufficient data. Note that the high credibility of 
(3^ > indicates the relevance of predator harvest to explaining or predicting non-zero population 
consumption. The positive trend is noticeable in the plot of non-zero yij against Xjjs, shown in 
Figure |4j To investigate the relevance of predator harvest at the node level instead of the food web 
level, we then fit a model with one random slope per node as receiver, so that 

Vij = /3o + (3j5Xij5 + Si + rj + Eij , /3j5|/35, cr/35 ~ N(/35, a^g) . 

For this random-slope model, positive credible intervals for (3j^ with a credibility of at least 70% 
are also observed for many non-basal nodes j including seals (in red in Figure |4]), while the other 
non-basal js correspond to a credibility of 50% or less for intervals excluding 0. 
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In the causal context, a positive credible interval would be evidence that increasing the harvest 
of a predator would increase the total consumption of prey by the predator, and hence, decrease 
the total availability of prey species. This reasoning is obviously counterintuitive as far as theo- 
rized population dynamics are concerned. Of course, our regression models are fitted to entirely 
empirical, observational data, so that causal conclusions may be inapplicable. The positive associ- 
ation here between non-zero yij and x^^ reflects the practice that predator populations with a high 
consumption of prey biomass tend to be harvested more heavily. 

A more direct approach to address the potential influence of a seal cull on the food web without 
employing population dynamics would be applying the LSN model (Westveld & Hoff, in press) to 
a food web that is repeatedly observed over time. Although still observational and non-causal, a 
temporal model would incorporate any temporal fluctuations in the relationship between predator 
and prey. Given observations made at regular time intervals over a substantial time period, such 
fluctuations should reflect the underlying population dynamics. For example, time points at which 
the seal population biomass increases is expected to correspond (subject to natural variability) to 
the time points at which the total consumption of prey species of seals also increases, and vice 
versa. Then, to assess the statistical significance of the influence of a seal cull (via increased seal 
harvest) on the availability of commercial fish species which are seals' prey, one may perform a 
posterior predictive analysis after fitting the LSN model. Such an analysis is achieved by determin- 
ing the posterior predictive distribution for yij, given reduced values of seal population biomass as 
covariate values, where j = seals and i corresponds to a prey species of interest. Any statistical 
significance is due entirely to empirical information through the temporal data for consumption 
and covariates for the food web, and could serve as validation of projected phenomena based on 
population dynamics theory. 

However, given the rarity of temporal food web diet data, an entirely empirical approach for 
validating population dynamics may be challenging. Proxy data based on stable isotopes (Jennings 
et al, 2002; Pinnegar & Polunin, 2000) that are repeatedly collected over time do exist, e.g. the 
South East Fishery Ecosystem Study conducted by the CSIRO Division of Fisheries from 1993 
to 1996. CSIRO Marine and Atmospheric Research (CMAR) continues to conduct studies that 
collect stable-isotope data for food web research. We intend to pursue statistical LSN modelling 
of these temporal stable-isotope data in the future. 

4.3 Models for per-adult consumption 

Although the five covariates as a whole show little relevance to population consumption, we also 
investigate their relevance to per-adult consumption defined by (|7]). We consider two models: one 
with all five covariates, and one with no covariates. Results are shown in Figure [5] and Table |4j 
Comparing Figures |3] and |5| we see that trophic clustering according to feeding activity depends 
on whether feeding consumption is considered on a (i) per-adult basis or (ii) per-population basis. 
Specifically, relative positions of nodes on the sr-plane are very different between (i) and (ii). 
The graph of posterior means for (i) shows four relatively distinct clusters of nodes, positioned 
nearby Nodes 1, 4, 16, and 23, respectively. However, the posterior distributions that correspond 
to these nodes indicate that the two clusters near Nodes 16 and 23 are only weakly statistically 
distinguishable due to substantial overlap. Moreover, all clusters show very similar sending activity 
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according to the dispersion on the s-scale. In contrast, the two graphs for (ii) show that there are at 
least four nodes that are highly distinguishable, and they are not all similar with respect to sending 
or feeding activity. 

Let us return to per-adult consumption. With all five covariates or none in the model, the poste- 
rior distribution of [sj, rj]' remains largely unchanged. The same is true for dependency parameters 
Psr and p. However, little change in these regards is not an indication of the irrelevance of co- 
variates to consumption volume. In fact, all covariates except taxonomic distance correspond to 
credible levels of 70% to 99.5% for intervals that exclude 0; predator biomass shows the strongest 
statistical relevance. This contrasts substantially with the weak relevance of covariates to pop- 
ulation consumption. Also different between the two definitions of consumption is the sign for 
/^s, which is negative for per-adult consumption. Although still a non-causal relationship, (3^ < 
suggests that those species that are more heavily harvested tend to consume less on a per-adult 
basis and vice versa, after adjusting for the five covariates. Irrespective of the sign of /^s, the non- 
temporal nature of the data makes it impossible to use the current statistical SNA to address the 
effects of a seal cull on commercial fisheries production. 

5 Discussion 

In this report, we have proposed a statistical social network modelling framework for weighted 
food we data. With substantial data, this framework allows us to examine trophic relations from 
the perspectives of (i) feeding activity, (ii) preference of being prey, and (iii) preference for prey, 
all with respect to consumption volume on either a per-individual or a per-population basis. This 
complements the existing framework for 0- 1 data applied to food webs by Chiu & Westveld (2010), 
who consider (i) to (iii) with respect the action of sending and receiving irrespective of volume. 

There are limitations to our framework for modelling weighted food web data. Indeed, the re- 
moval of Uij = from consideration is not ideal. It drastically reduces the amount of available data, 
and consequently, inference for feeding preference can be weak. Furthermore, phenomena such 
as Psr > and p > are difficult to interpret in this unconventional context of trophic links. As 
Dr. Beth Fulton of CMAR pointed out at the 2010 CMAR Trophodynamics Workshop in Hobart, 
Tasmania, the existence of true zeros for links between pairs is often one of the most important 
aspects of food web research. Before a more innovative statistical framework is available to model 
weighted food web data that are heavily dominated by Os, we have presented the reduced statistical 
SNA technique in this report to serve as a compromise between relying solely on presence- absence 
data, and modeling weighted yij > with an inappropriate distributional assumption for yij . 

Currently, we are considering the following mixture model to incorporate the point mass of yij 
atO: 

Pij = P{yij > 0) , follows 1^, y*j \ {y^j > 0}, x^j, (3\ a* ~ N{x[j/3* ,a*^) 

where y*j is the appropriately transformed value of to achieve normality, and /3* and a* axe 
parameters at the level of y*j (not the level of pij). 

Variants of this model, including one with random slope parameters per node, are also being 
considered. When implemented, the model is expected to mitigate several existing difficulties 
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Figure 5: Results of fitting the reduced model in Section [3] but removing u\vj from ([8|, and taking 
Uij as per-adult consumption defined in Top left: posterior mean of [si,rj]' under the five- 
covariate model. Bottom left: same as top left but with no covariates. Top right: MCMC samples 
from the posterior distribution of [sj, r^]' for i = 1 (black), 4 (red), 16 (blue), and 23 (purple), under 
the five-covariate model. 





18 



associated with the reduced framework for weighted data that we have proposed in this report. 
However, new computational challenges may arise due to increased model complexity. We intend 
to tackle this extended model and corresponding difficulties in a future report. 

We also intend to address research questions that pertain to population dynamics, such as the 
one regarding seal cull. For this, we expect to rely on adapting the statistical LSN framework of 
Westveld & Hoff (in press) to the food web context for modelling temporal stable-isotope data. 
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Table 1 


: Nodes of the Benguela rood web. 


Node 


Orgamsm(s) Notes 


1 


Phytoplankton Basal 


2 


Benthic filter-feeders Basal 


3 


Bacteria 


4 


Benthic carnivores 


5 


Microzooplankton 


6 


Mesozooplankton 


7 


Macrozooplankton 


8 


Gelatinous zooplankton 


9 


Anchovy 


10 


Pilchard 


11 


Round herring 


12 


Lightfish 


13 


Lanternfish 


14 


Goby 


15 


Other pelagics 


16 


Horse mackerel 


17 


Chub mackerel 


18 


Other groundfish 


19 


T T 1 

Hakes 


20 


Squid 


21 


Tunas 


22 


Snoek 


23 


Kob 


24 


Yellowtail 


25 


Geelbek 


26 


Whales and dolphins 


27 


Birds 


28 


Seals 


29 


Sharks 
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Table 2: Credible intervals for parameters of interest in the reduced model in Section[3]but remov- 
ing u'jVj from ([8]), and taking Uij as population consumption defined in ([6]). Credible levels above 
0.5 are presented in bold. 







v^rcciiDic inicrvdi 
T nwpr limit TTnnpr limit 


r^rpHihlp IpvpI 




5.0x10-^ 


(interval includes 0) 


0.50 


(taxo. dist.) 










-1.5x10-=^ 


(interval includes 0) 


0.50 


(prey biomass) 










-8.0x10"^ 


(interval includes 0) 


0.50 


(prey harvest) 










2.1 xlO^^ 


0.000 0.004 


0.55 


(predator biomass) 










6.2x10-^ 


(interval includes 0) 


0.50 


(predator harvest) 








Psr 


0.353 


0.017 0.696 


0.65 


P 


0.329 


0.069 0.927 


0.90 



Table 3: Credible intervals for parameters of interest in the reduced model in Section |3] but taking 
Xij5 as the sole covariate and removing u[vj from ([S]), and taking yij as population consumption 
defined in (|6]). 



Parameter 


Posterior mean 


Credible Interval 
Lower limit Upper limit 


Credible level 




0.003 


0.000 0.005 


0.90 


Psr 


0.305 


0.017 0.696 


0.90 


P 


0.506 


0.043 0.761 


0.70 
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Table 4: Credible intervals for parameters of interest in the reduced model in Section [3]but remov- 
ing u\vj from ([8]), and taking yij as per- adult consumption defined in Parameters without '*' 
correspond to the five-covariate model; those with '*' correspond to no covariates. 



Parameter 


Posterior mean 


Credible Interval 
Lower limit Upper limit 


Credible level 




0.003 


0.000 


0.006 


0.50 




-0.007 


-0.014 


-0.000 


0.85 




-0.004 


-0.010 


-0.000 


0.75 


Pa 


0.139 


0.001 


0.027 


0.995 




-0.005 


-0.009 


-0.000 


0.70 


Psr 


0.343 


0.079 


0.989 


0.70 


Psr 


0.436 


0.130 


0.994 


0.75 


P 


-0.145 


(interval includes 0) 


0.50 


P* 


-0.090 


(interval includes 0) 


0.50 
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